Off-line cursive handwriting recognition using synthetic training data
Abstract
The objective of this thesis is to investigate the generation and use of synthetic training data for off-line cursive handwriting recognition. Many earlier works have shown that the size and quality of the training data have a great impact on the performance of handwriting recognition systems. A general observation is that the more text is used for training, the better the recognition performance that can be achieved. This work examines whether the observation still holds when the training set is augmented with synthetically generated texts. The motivation is that augmenting the training set with computer-generated text samples is much faster and cheaper than collecting additional human-written samples. For this purpose, two novel methods for generating synthetic text lines are presented. The first is based on the geometrical perturbation of existing human-written text line images. In the second, handwritten text lines are synthesized from ASCII transcriptions, using character templates and the Delta LogNormal model of handwriting generation. Both methods are evaluated for synthetic training set expansion on the task of off-line cursive English handwritten text line recognition. The last part of the thesis presents a novel approach for segmenting text lines into individual words, which enables segmentation-based text line recognition and gives new insights into the effects of synthetically expanded training sets. Several configurations of the recognizer and of the synthetic handwriting generation process are examined. Based on the experimental results, it is concluded that the use of synthetic training data can lead to improved recognition performance of handwriting recognition systems.
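The abstract does not spell out which geometrical perturbations the thesis applies, so the following is only a minimal sketch of the general idea, assuming a single horizontal shear (a slant change) applied to a text line stored as a list of pixel rows. The names `shear_line` and `perturb_line` and the `max_slant` parameter are hypothetical and are not taken from the thesis.

```python
import random


def shear_line(image, slant, background=0):
    """Shear a text line image horizontally (hypothetical helper).

    image: list of equal-length rows of pixel values, top row first.
    slant: horizontal shift in pixels per row of height; positive values
           push the top of the line to the right.
    """
    height = len(image)
    width = len(image[0])
    max_shift = int(round(abs(slant) * (height - 1)))
    out_width = width + max_shift  # widen the canvas so no ink is cut off
    sheared = []
    for y, row in enumerate(image):
        # Rows further from the bottom row are shifted further.
        shift = int(round(slant * (height - 1 - y)))
        if slant < 0:
            shift += max_shift  # keep every offset non-negative
        new_row = [background] * out_width
        new_row[shift:shift + width] = row
        sheared.append(new_row)
    return sheared


def perturb_line(image, max_slant=0.3, rng=None):
    """Return one synthetic variant of a line with a random slant.

    max_slant is an assumed bound, not a value from the thesis.
    """
    rng = rng or random.Random()
    return shear_line(image, rng.uniform(-max_slant, max_slant))
```

Calling `perturb_line` several times on the same human-written line would yield several differently slanted variants that could be added to a training set. A realistic perturbation model would likely combine several such transformations rather than a single shear.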
Similar resources
Effects of Training Set Expansion in Handwriting Recognition Using Synthetic Data
A perturbation model for the generation of synthetic textlines from existing cursively handwritten lines of text produced by human writers is presented. Our goal is to improve the performance of an off-line cursive handwriting recognition system by providing it with additional synthetic training data. It can be expected that by adding synthetic training data the variability of the training set ...
Generation of Synthetic Training Data for an HMM-based Handwriting Recognition System
A perturbation model for generating synthetic textlines from existing cursively handwritten lines of text produced by human writers is presented. Our purpose is to improve the performance of an HMM-based off-line cursive handwriting recognition system by providing it with additional synthetic training data. Two kinds of perturbations are applied, geometrical transformations and thinning/thicken...
Parameter calibration for synthesizing realistic-looking variability in offline handwriting
Being motivated by the widely accepted principle that the more training data we have, the better performance the recognition system has, we conducted experiments asking human subjects to do test on a mixture of real English handwritten textlines and textlines altered from existing handwriting with various distortion degrees. The idea of generating synthetic handwriting is based on a perturbatio...
Off-line cursive handwriting recognition compared with on-line recognition
Off-line handwriting recognition has wider applications than on-line recognition, yet it seems to be a harder problem. While on-line recognition is based on pen trajectory data, off-line recognition has to rely on pixel data only. We present a comparison between an off-line and an on-line recognition system using the same databases and system design. Both systems use a sliding window technique ...
Chapter 0 Stroke-Based Cursive Character Recognition
In this chapter, we keep focusing on on-line writer independent cursive character recognition engine. In what follows, we explain the importance of on-line handwriting recognition over off-line, the necessity of writer independent system and the importance as well as scope of cursive scripts like Devanagari. Devanagari is considered as one of the known cursive scripts [20, 29]. However, we aim ...
Publication date: 2006